Runtime system level fault tolerance for a distributed functional language
Authors
Abstract
Distributed fault tolerance entails detecting errors, confining the damage caused, recovering from the errors, and providing continued service on a network of co-operating machines. Functional languages potentially offer benefits for distributed fault tolerance: many computations are pure, and hence have no side-effects to be reversed during error recovery. Moreover, functional languages have a high-level runtime system (RTS) where computations and data are readily manipulated. We propose a new RTS level of fault tolerance for distributed functional languages, and outline a design for its implementation in Glasgow distributed Haskell (GdH). GdH is a small extension to the Haskell language, and the fault tolerance design utilises existing distributed graph reduction mechanisms. The design distinguishes between pure and impure computations: impure, or side-effecting, computations must be recovered using conventional exception-based techniques, but the RTS attempts implicit backward recovery of pure computations.
Similar resources
Transparent fault tolerance for scalable functional computation
Reliability is set to become a major concern on emergent large-scale architectures. While there are many parallel languages, and indeed many parallel functional languages, very few address reliability. The notable exception is the widely emulated Erlang distributed actor model that provides explicit supervision and recovery of actors with isolated state. We investigate scalable transparent faul...
A Tool for Constructing Service Replication Systems
Service replication is a key to providing high availability, fault tolerance and good performance in distributed systems. However, building a service replication system is a difficult and complex task. This paper describes a tool that mimics the design of the remote procedure call (RPC) system to support building distributed service replication systems. The tool includes an interface definition la...
Operational Semantics for Declarative Networking
Declarative Networking has been recently promoted as a high-level programming paradigm to more conveniently describe and implement systems that run in a distributed fashion over a computer network. It has already been used to implement various networked systems, e.g., network overlays, Byzantine fault tolerance protocols, and distributed hash tables. Declarative Networking relies upon a rule-ba...
Runtime Verification for Ultra-Critical Systems
Runtime verification (RV) is a natural fit for ultra-critical systems, where correctness is imperative. In ultra-critical systems, even if the software is fault-free, because of the inherent unreliability of commodity hardware and the adversity of operational environments, processing units (and their hosted software) are replicated, and fault-tolerant algorithms are used to compare the outputs....
The HiPE/x86 Erlang Compiler: System Description and Performance Evaluation
Erlang is a concurrent functional language, tailored for large-scale distributed and fault-tolerant control software. Its primary implementation is Ericsson’s Erlang/OTP system, which is based on a virtual machine interpreter. HiPE (High-Performance Erlang) adds a native code execution mode to the Erlang/OTP system. This paper describes the x86 version of HiPE, including a detailed account of d...